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Abstract 

This paper addresses the problem of nonlinear multivariate root finding. In an earlier paper we describe 
a system called Newton which finds roots of systems of nonlinear equations using refinements of interval 
methods. The refinements are inspired by AI constraint propagation techniques. Newton is competitive 
with continuation methods on most benchmarks and can handle a variety of cases that are infeasible for 
continuation methods. This paper presents three “cuts” which we believe capture the essential theoretical 
ideas behind the success of Newton. This paper describes the cuts in a concise and abstract manner which, 
we believe, makes the theoretical content of our work more apparent. Any implementation will need to 
adopt some heuristic control mechanism. Heuristic control of the cuts is only briefly discussed here. 
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1 Introduction 

In this paper we address the problem of finding solu¬ 
tions to large systems of nonlinear equations. This is 
an old problem with many applications and a large lit¬ 
erature. The problem of of determining the existence 
of a solution to algebraic constraints on real numbers is 
known to be NP-hard and to be in PSPACE but it is 
not known whether this problem is in NP. In engineer¬ 
ing applications it is generally sufficient to consider a 
simpler problem — the problem of determining the ex¬ 
istence of floating point numbers that satisfy the given 
constraints up to the accuracy of floating point compu¬ 
tations. This problem is easily shown to be NP-hard. 
However, since one can always guess the floating point 
numbers involved, floating point constraint satisfaction 
is in NP and hence NP-complete. Our work on floating 
point constraints emphasizes refinements of constraint 
propagation techniques for NP complete problems that 
have been developed in the AI and logic programming 
communities [1, 2, 3]. 

In an earlier paper we describe an implemented sys¬ 
tem called Newton [4]. Newton uses a branch and prop¬ 
agate algorithm whose propagation phase manipulates 
upper and lower bounds on the variables appearing in 
the given equations. Algorithms which manipulate up¬ 
per and lower bounds are called interval methods — 
Newton is based on interval methods [5]. A well-studied 
alternative to interval methods is the so-called “continu¬ 
ation method” or “homotopy technique” [6, 14]. Table 1 
summarizes our previously published results comparing 
Newton with earlier interval based systems and with 
continuation methods. 1 The benchmarks were taken 
from papers on numerical analysis [12], interval analy¬ 
sis [8, 9, 11], and continuation methods [14, 6, 13, 10]. 
The table shows that Newton outperforms an earlier in¬ 
terval based system and is competitive with continuation 
methods on the benchmarks tried. The novelty of New¬ 
ton resides in the use of bound propagation techniques 
which are stronger than those used in previous interval 
systems. Newton can solve a variety of problems for 
which continuation methods are infeasible (those bench- 

J The Newton system was run on a Sun Sparc 10 work¬ 
station to obtain all solutions to each benchmark system of 
equations. The column labeled Newton gives the run time (in 
seconds) of the Newton system on a given benchmark prob¬ 
lem. The column labeled HRB gives timings for a traditional 
interval method using Hansen-Segupta’s operator, range test¬ 
ing, and branching. This method uses the same implemen¬ 
tation technology as ours, C code running on a SPARC 10. 
Some interval methods such as [7] have better convergence 
properties near solutions. Our main contribution aims at 
speeding up the computation when far from a solution and 
hence comparing it to HRB is meaningful. The column labeled 
CONT gives timings for a state of the art continuation method 
[14] running on a DEC 5000/200 which is perhaps slightly 
slower than a SPARC 10. For each benchmark, we give the 
number of variables (ra), the total degree of the system (d), 
the initial range for the variables, and the run time of each 
system in seconds. A space in a column means that the re¬ 
sult is not available for the method. A question mark means 
that the method does not terminate in a reasonable time (> 
1 hour). Further details on these timings can be found in [4]. 
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Table 1: Summary of the Experimental Results 


marks with total degree greater than, say, 2 40 ). 

Our earlier paper on Newton describes the algorithm 
used in sufficient detail to allow replication of the empiri¬ 
cal results [4]. This includes a detailed description of the 
heuristics used. The details tend to obscure what we feel 
are the essential technical ideas behind the algorithms. 
In this paper we give an abstract presentation of three 
“cuts” where each cut specifies a way of inferring a new 
upper or lower bound on a variable. These abstract cuts, 
while insufficient in themselves to allow exact replication 
of our results, seem to capture the essence of Newton’s 
ability to prune the search space and a description of 
the cuts should allow others to replicate the qualitative 
performance of the Newton system. 

The heart of Newton’s algorithm is a propagation pro¬ 
cess which iteratively improves known bounds on vari¬ 
ables using a set of inference rules we call interval cuts. 
An interval cut is a method of inferring an inequality 
of the form x < b or x > b where & is a floating point 
number. The next section specifies three abstract cuts 
used in the Newton system. Section 3 briefly outlines 
how these cuts can be used in a branch and propagate 
procedure for findings all roots of a system of nonlinear 
equations. Section 3 is a simplification of the algorithm 
actually used in Newton. 

2 Three Cuts 

In order to define our cuts some preliminary terminol¬ 
ogy is needed. Our algorithms work with floating point 
numbers which we assume to be a finite set of ratio¬ 
nal numbers extended with both a positive and nega¬ 
tive infinity, denoted +oo and —oo respectively. We will 




generally use the term “interval” to refer to a closed 
interval of the form [a, b] where a and b are floating 
point numbers. Our algorithms also work with symbolic 
expressions constructed from floating point constants, 
symbolic variables, addition, subtraction, multiplication, 
and “power expressions” of the form e n where e is an 
expression and n is a positive integer constant. Power 
expressions improve the accuracy of interval evaluation. 
We will generally use the term “expression” to refer to 
symbolic expressions of this form. We will use the term 
“box” to refer to an assignment of intervals to symbolic 
variables. For box B and variable x we write B(x) to 
denote the interval assigned to 0 in the box B. Clearly, 
if we are working with n variables then a box is a recti¬ 
linear subset of R n . We also use B[x := [a, b ]] to denote 
the box which is identical to B except that it assigns the 
variable x the interval [a, b], A “point” is an assignment 
of particular floating point numbers to each variable. We 
allow B(x) to be interval of the form [a, a]. A point can 
be viewed as a box in which all intervals are of this form. 

All interval methods are based on interval evalua¬ 
tion — an evaluation process which computes an inter¬ 
val value for an expression from given intervals for the 
variables it contains. The most straightforward interval 
evaluation method is obtained by associating each of the 
operations of addition, subtraction, multiplication, and 
powers with a corresponding operation from intervals to 
intervals. More specifically we have the following opera¬ 
tions on intervals. 



Figure 1: A Newton cut. deriving x < c 1 — 


An interval cut. is a. procedure which operates on a. 
set. of constraints and a. current, box. An interval cut. 
reduces the given box by deriving a. new bound on one 
of the variables. Cuts are presented below as nondeter- 
minist.ic procedures — performing a. cut. often involves 
guessing one or more values. In practice these guesses 
are generated by deterministic heuristic methods such 
as those described in section 3. A nondet.erminist.ic cut. 
procedure can also be viewed as an inference rule for de¬ 
riving new bounds independent, of heuristic methods for 
finding useful inferences. 


[a, b] + [c, d] = [a. + c, b + d] 

[a, b] — [c, d] = [a. — d, b — c] 

[a, b] * [c, d] = [ min(ac, ad, be, bd), max(ac, ad, bebd )] 
f [a", b n ] n odd 

[a, b] n = < [0,roa.i((i" , 6")] n even; 0 £ [a, b] 

\min\a n ,b n ), m-ax(a n ,b n )] n even; 0 0 [a, b] 

We use the notation NE(e, B) t.o denote the inter¬ 
val value of an expression e when variables are assigned 
the interval values given in B and operations are inter¬ 
preted as operations on intervals according to the above 
rules (NE stands for “natural evaluation”). For exam¬ 
ple if B(x) is [—1, 1] then NE(x + x, B) is the inter¬ 
val [—2, 2]; NE(x * x, B) is the interval [—1, 1]; but. 
NE(x 2 , B) is [0, 1]. For most, nontrivial expressions 
the interval NE(e, B) is larger than the actual range of 
values of e over points in B. If B(x) is [— 1, 1] we have 
that NE(x 2 — x, B) is [—1, 2] while the actual range of 
values in this box is [—1/4, 2]. In our implementation 
outward rounding is used in the above rules — in arith¬ 
metic for computing lower bounds we round down and 
in arithmetic for computing upper bounds we round up. 
This provides a. guarantee that NE(e, B) contains the 
actual range of values of e independent, of numerical er¬ 
rors. This guarantee is often expressed as the statement, 
that, interval evaluation is “conservative” . 

We will assume a. given set. of constraints of the form 
e < 0 where e is an expression as defined above. An 
equation e\ = eo can obviously be represented with two 
constraints of the form e\ — eo < 0 and e^ — e\ < 0. For 
technical reasons it. will be easier to work with inequali¬ 
ties. 


2.1 Newton Cuts 

A Newton cut. is represented schematically in figure 1. 
The notation used in the figure is defined rigorously in 
the following description. In the Newton cut., and in 
later cuts, we abuse notation and use de/dx to denote 
the expression that, is the symbolic derivative of e with 
respect, t.o x. 

Newton Cut: Let. e < 0 be a. constraint., B a. 
given box, x a. variable appearing in e, and let. 

[a, 6] be the interval B(x). Let. c and c' be 
two cut. values such that, c < b and c' in [c, b ]. 

Let. s be the lower bound of NE(e, B[x := 
[c',c']]). We require s > 0. Let. [D\, D u ] be 
the interval NE(de/dx, B[x := [c, 6]]). To 
ensure there is no solution with x £ [c / , b] we 
require that, either Di > 0 or Dj < 0 and 
c' — > b. If D u < Owe can infer x < c and 

if D u > 0 we can infer x < max(c, c' — ). 

The lines of slope D u and Dj in the figure 1 represent, 
the fastest, possible rate of descent, of the value of the 
expression e as a. function of x based on the interval eval¬ 
uation of the derivative de/dx. It. should be clear that, 
all values of x inside the triangle defined by these lines 
must, violate the constraint, e < 0 and can be pruned. 

An interesting special case is when c = a and c' = b. 
Whenever the lower bound of NE(e, B[x := b,b]]) is 
greater than zero the upper bound b can be ruled out. as 
a. value, of x. If we are'working with a. finite box, i.e., 
one where all variable intervals are finite, and ignoring 
roundoff errors, D u must, be finite and the choice of c = a 
and c' = b always achieves some reduction of the upper 
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bound. However, larger cuts are usually achieved by 
other choices of c and d. Increasing the value of c reduces 
the size of the interval [c, 6] and hence reduces the range 
of possible derivative values NE(de/dx, B[x := [c, &]]). 
This often gives a smaller value for D u and hence a larger 
cut. 

Decreasing the value of d away from b can also give 
a larger cut. As in the statement of the cut, let s(c') be 
the lower bound of NE(e, B[x := [d, c']]), as a function 
of d. In practice a reduction in the upper bound can 
only be achieved when s(b) > 0, i.e., the value b can be 
ruled out. As d is reduced from b, the lower bound s(c') 
will also typically be reduced. However, it typically falls 
more slowly than D u , which is the largest possible value 
of de/dx in the interval [c, b]. If ds/dd is smaller than 
D u then reducing d away from b will result in a larger 
cut. However if we reduce d too aggressively then the cut 
may fail altogether either because we fail to get s(c') > 0 
or we fail to rule out feasible points with x £ [d, b]. 

The above cut allows for the reduction of upper 
bounds. We call it an upper bound cut. Every upper 
bound cut has a dual lower bound cut. We will only 
present upper bound cuts. 

2.2 Snake Cuts 

Consider a given constraint e < 0, a given variable x 
appearing in the constraint, a given box B, and let [a, b] 
be the interval B(x). Now consider a value d in the 
interval [a, b] and consider the lower bound s of the 
interval NE(e, B[x := [c',c']]) as a function of d. We 
will write s(c') to express s as a function of d for fixed 
values of e and B. The snake cut is identical to the 
Newton cut except that de/dx is replaced by ds/dd. 2 
It turns out that the range of values of ds/dd over the 
interval [c, 6] is often significantly less than the range of 
possible values of values of de/dx. This fact allows cuts 
to be larger. 

In order to replace de/dx by ds/dd we need to first 
represent the function s(d) as a symbolic expression s(T) 
involving only the variable x. To do this we first sym¬ 
bolically rewrite e as a polynomial P(x) of the form 
so + s\x + -sox 2 + . . . + s„x n where each coefficient s 8 - 
is an expression not involving x. We then define two 
polynomials in x, P~(x) and P + (x), each of which has 
constant coefficients (and hence no variables other than 
x). P~ (x) gives a lower bound on e for negative values of 
x and P + (x) gives a lower bound on e for positive values 
of x. For odd powers of x the coefficients of P~ (x) are 
taken to be the upper bound of NE(.Si, B) where is 
the corresponding symbolic coefficient of the polynomial 
P(x). In all other cases the coefficients of P~ (x) and 
P + (x) are taken to be the lower bound of NE(si, B). 
A function s(*) can now be represented symbolically by 
the conditional expression if (a; > 0, P + {x), P~(x)) 
which has the property that it equals P + (x) for posi¬ 
tive values of x and P~(x) for other values of x. The 
interval evaluation function NE and the symbolic differ¬ 
entiation operators can be easily extended to handle con- 

2 In informal discussions we began calling the function 
s(d) a “snake” and the cut based on an analysis of this func¬ 
tion became known as the snake cut. 


ditional expressions. The symbolic expression s(*) does 
not correspond exactly to the function s(d) as defined 
above because rewriting an expression as a polynomial 
in x changes the interval returned by interval evaluation. 
However, the symbolic expression s(*) does give a valid 
lower bound on the values of e for points in the box B 
as a function of the value for x. 

Snake Cut: Let e < 0, x, B, [a, b] and the 
symbolic expression s(*) be defined as above. 

Let c and d be two additional parameters as 
in the Newton cut with c < b and d in [c, b]. 

We require that s(d) > 0. Let [£);, D u \ be 
the interval NE(ds/dx, B[x := [c, &]]). To 
ensure there is no solution with x £ \d, b] we 
require that either Dj > 0 or Dj < 0 and 
d ~ ^ D u < Owe can infer x < c and 

if D u > 0 we can infer x < max(c, d — ^-1). 

To see the advantage of the snake cut over the Newton 
cut consider the expression zx — 2 < 0 where the box B 
assigns both x and z the interval [1, 4]. It is not difficult 
to see that since z is in [1, 4] and zx < 2 we have x <2. 
We compare a Newton cut with a snake cut where we 
take c = 1 and d = 4 in both cuts (i.e., c = a and 
d = b). In the Newton cut we have that NE(de/dx, B) 
equals NE(z, B) which is [1, 4]. So the infered bound 
is x < 4 — | = 3^. However the lower bound polynomial 
s(a:) (for positive values of x) is x — 2. So NE(ds/dx, B) 
equals NE(l, B) equals [1, 1] and the infered bound is 
x < 4 — I = 2, an optimal cut. 

Unfortunately snake cuts require rewriting the con¬ 
straint in a way that makes interval evaluation less ac¬ 
curate. In our experience snake cuts are most useful in 
very large boxes and Newton cuts become more useful 
as the boxes get smaller. 

2.3 Combination Cuts 

The combination cut is used to gain some of the power of 
multivariate Newton’s method once the search has begun 
to converge on a solution. Interval versions of multivari¬ 
ate Newton’s method are widely used in interval systems, 
e.g., [8, 7]. The Newton cuts we use involve two ideas. 
First, we compute combined constraints by inverting the 
“center Jacobian” matrix of the constraint system. Sec¬ 
ond, in the combination cuts we use a different method 
of interval evaluation called Taylor evaluation which is 
more accurate for small boxes. 

Like the natural evaluation operator NE, Taylor in¬ 
terval evaluation takes an expression e and a box B and 
returns an interval containing the range of values of e 
on points in B. Taylor interval evaluation tends to be 
more accurate in the case where B is small. To define 
the Taylor interval evaluation we first define B c to be the 
center of the box B, i.e., B c (x) is the interval [c, c] where 
c is the center of the interval B(x). We use the notation 
X{ £ e to indicate that X{ is a variable appearing in the 
expression e. 

TE(e, B) = NE(e, B c ) + ^ ( NE(de/dx t , B) * (B(x t ) - B c (x t )) 
x t Ee 
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All arithmetic operations in the above expression are 
operations on intervals. Taylor interval evaluation is 
more accurate than natural interval evaluation in small 
intervals. For example, consider the polynomial x 2 — x 
where B(x) is the interval [x c — e, x c + e] where x c > ^ 
and e is a small positive number. We have 

NE(x 2 c -x c , B) = [(x 2 -x c )-(2x c + l)e+e 2 , (x 2 -x c ) + (2x c + l)e+ e 2 ] 
TE{x 2 -x c , B) = [(x 2 -x c )-(2x c -l)e+2e 2 , (x 2 - x c ) + (2x c - 1 )e + 2 2 ] 

For small values of o the width of the first interval is 
approximately 2(2x c + l)e while the width of the second 
interval is approximately 2(2x c — l)e. Since we have as¬ 
sumed x c > i the first interval is wider than the second. 
(If we assume x c < ^ then the interval product in the 
Taylor interval calculation gives a different symbolic an¬ 
swer and the Taylor interval is still smaller.) The natu¬ 
ral evaluation does not take into account the correlations 
between the two terms in the polynomial x 2 — x. More 
generally, if P(x) is a polynomial in x and B(x) is an in¬ 
terval of the form [x c — e, x c + e] then to first order in o we 
have that the width of TE(P(x), e) is 2P'(x c )e where P' 
is the symbolic derivative of P. P'(x c ) is nonzero then as 
e goes to 0 the ratio of the width of TE(P(x ), B ) to the 
width of the true range of values of P(x) approaches 1 
— Taylor interval evaluation becomes exactly accurate 
in the limit of small intervals. As the above example 
shows, this is not true of the natural evaluation. 

Combination cuts build new constraints which are lin¬ 
ear combinations of given constraints. If we are given 
e\ < 0 and 02 < 0 then for any positive a and [3 we can 
infer ae\ + f3e 2 < 0. A combination cut is identical to 
a Newton cut except that it uses a constraint derived as 
a linear combination of input constraints and uses TE 
rather than NE in the computation of s. 

Combination Cut: Let x be a variable, B a 
box, and [a, b] the interval B(x). Let c and c' 
be two additional parameters with c < b and 
c' in [c, b]. Let e\ < 0, 02 < 0, . . ., e n < 0 be 
a set of constraints such that each e 8 - contains 
x. Let u be a linear combination 0 . 20.2 + 

. . . + a„e„ where each ay is positive. Let s be 
the lower bound of TE(u, B[x := [c', c']]). 

We require s > 0. Let [Di, D u \ be the in¬ 
terval NE(du / dx, B). To ensure there is no 
solution with x £ \c', 6] we require that either 
Di > 0 or Di < 0 and o' — > b. If D u < 0 

we can infer x < c and if D u > 0 we can infer 
x < max(c, o' — jj-)- 

To see the power of the combination cut consider the 
two constraints x + y — 2 < 0 and x — y < 0. If both 
B(x) and B(y) are the interval [0, 2] then no Newton 
or snake cut can be used to improve the bounds — each 
constraint is “satisfied at the end points”. For example 
NE(x + y — 2, B[x := [2, 2]]) is [0, 2] so x = 2 can not be 
ruled out on the basis of this constraint alone. To reduce 
the box B we must examine both constraints simulta¬ 
neously. Adding the two constraints together (with unit 
coefficients) gives 2x — 2 < 0. This immediately gives the 
tighter bound x < 1. If we are given the four inequality 


constraints representing x + y = 2 and x — y = 0 then 
four combination cuts can be used to solve for x and y 
exactly. A method for finding appropriate combination 
cuts is described in the next section. 

3 Branch and Propagate Root Finding 

To find a feasible point for a set of inequality constraints, 
or a root to a set of equations, one can use a simple 
branch and propagate procedure. One starts with a box 
large enough to contain all points of interest. One then 
reduces the box by applying cuts to derive new bounds. 
At some point one may decide to branch. To branch one 
selects a variable x. Let [a, b] be the interval B(x) where 
B is the box in use at the time of the branch. One then 
“splits” the box into the two branches B[x := [a, ^^-]] 
and B[x := [^^-,6]]. One then recursively computes a 
solution inside the first box, or if there is no such solu¬ 
tion, a solution inside the second box. The procedure 
terminates when the current box is sufficiently small, 
e.g., every variable is assigned an interval of width less 
than, say, 10 -8 . Newton uses a simple round-robin strat¬ 
egy to select the variable to split — down any branch of 
the search tree Newton splits a variable that has gone 
the longest time without splitting. 

The basic branch and propagate procedure can be 
used with a wide variety of heuristics for folding useful 
cuts and for determining when to branch. Our experi¬ 
ence with Newton indicates that one useful heuristic is 
to only perform cuts that result in at least a 10% re¬ 
duction of an interval. This ensures that the number of 
cuts in a single propagation phase is at worst linear in 
the number of variables in the constraints. However, this 
same heuristic will force the procedure to branch before 
all possible cuts have been exhausted. In our experience 
such “early branching” is usually desirable. 

Now consider selecting the values of c and o' in a cut 
on variable x with interval [a, b], A simple strategy is 
to select c to be b — for some non-negative integer n 
and to select o' to be the midpoint of [c, b]. In practice 
we need only consider values of n up to 3 since we are 
only interested in cuts which reduce the interval by at 
least 10%. For a given variable we can start with n = 0 
and if that cut fails increase n (up to 3) looking for a 
viable cut. 

Combination cuts require selecting not only c and o' 
but also a linear combination of the constraints. Be¬ 
cause computing appropriate coefficients for the linear 
combination can be expensive we suggest performing all 
possible Newton and snake cuts before attempting com¬ 
bination cuts. In performing combination cuts we as¬ 
sume that the input is given as a set of equations of the 
form Oi = 0. This allows one to compute combined con¬ 
straints of the form u x = 0 where each u x is a linear 
combination of the e; and where u x is intended to con¬ 
strain x independent of the value of other variables. To 
compute the combined constraints u x let B be the box 
existing at the time we are computing the combined con¬ 
straints. We define the “center Jacobian” matrix [5; j] 

by 

Bij = the midpoint of the interval NE(dei / dxj , B). 
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Assuming that the matrix [-Bgj] is nonsingular one can 
compute [Aij\ as of [-Bgj] -1 - We then define the expres¬ 
sion u Xt by 

n 

^ AjjcGk. 

k = 1 

Inside the box B, and especially for reasonably small 
boxes, we have that du x Jdxj is near 1 if i = j and near 
0 otherwise. So u x = 0 is primarilly a constraint on 
x. Once the combined constraints of the form u x = 0 
have been computed we can perform combination cuts 
on this fixed set of combined constraints selecting c and 
d as specified above. In our experience it is best to 
perform only a single matrix inversion in any one propa¬ 
gation phase of the procedure — once the coefficients of 
the combinations constraints have been computed they 
should remain fixed throughout the remainder of that 
cutting phase. If the initial box is reasonably small, 
propagation relative to a fixed set of combined con¬ 
straints will converge on an exact solution and only a 
single matrix inversion is needed. When the heuristics 
for selecting cuts fail to find any viable cuts the proce¬ 
dure branches. 

The precise details of Newton’s procedure for selecting 
cuts can be found in [4]. Although these details may be 
required to exactly replicate Newton’s behavior, it seems 
likely that the qualitative performance of the system can 
be duplicated using only the heuristics outlined here. 

4 Conclusion 

This paper describes a branch and propagate algorithm 
to find all isolated solutions for a system of polynomial 
equations over the reals. Our techniques were originally 
inspired by local constraint propagation techniques used 
in AI and logic programming. The mathematics behind 
our techniques is quite straightforward and far less so¬ 
phisticated than that underlying continuation methods. 
Yet our techniques seem to compare well with continua¬ 
tion methods on a wide variety of benchmarks. 
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